Goal :
The objective of this project is to identify and analyze key factors contributing to road traffic accidents, with the aim of understanding patterns related to road conditions, weather, and time of the day.
About Dataset :
The dataset was obtained from Kaggle and contains two tables: the raw RTA data (before preprocessing) and a cleaned (preprocessed) version. We'll use the cleaned data in our analysis. The dataset consists of manual records of road traffic accidents in Addis Ababa City, Ethiopia, for the years 2017-20, collected from the sub-city police departments for master's research work.
Click here to check out the dataset : Traffic Data
import pandas as pd
import plotly.express as px
df = pd.read_csv("C:/Users/obalabi adepoju/Documents/traffic.csv")
We'll look at a general overview of our data and a description of each column.
df.head(10)
| | Age_band_of_driver | Sex_of_driver | Educational_level | Vehicle_driver_relation | Driving_experience | Lanes_or_Medians | Types_of_Junction | Road_surface_type | Light_conditions | Weather_conditions | Type_of_collision | Vehicle_movement | Pedestrian_movement | Cause_of_accident | Accident_severity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18-30 | Male | Above high school | Employee | 1-2yr | Unknown | No junction | Asphalt roads | Daylight | Normal | Collision with roadside-parked vehicles | Going straight | Not a Pedestrian | Moving Backward | 2 |
| 1 | 31-50 | Male | Junior high school | Employee | Above 10yr | Undivided Two way | No junction | Asphalt roads | Daylight | Normal | Vehicle with vehicle collision | Going straight | Not a Pedestrian | Overtaking | 2 |
| 2 | 18-30 | Male | Junior high school | Employee | 1-2yr | other | No junction | Asphalt roads | Daylight | Normal | Collision with roadside objects | Going straight | Not a Pedestrian | Changing lane to the left | 1 |
| 3 | 18-30 | Male | Junior high school | Employee | 5-10yr | other | Y Shape | Earth roads | Darkness - lights lit | Normal | Vehicle with vehicle collision | Going straight | Not a Pedestrian | Changing lane to the right | 2 |
| 4 | 18-30 | Male | Junior high school | Employee | 2-5yr | other | Y Shape | Asphalt roads | Darkness - lights lit | Normal | Vehicle with vehicle collision | Going straight | Not a Pedestrian | Overtaking | 2 |
| 5 | 31-50 | Male | Unknown | Unknown | Unknown | Unknown | Y Shape | Unknown | Daylight | Normal | Vehicle with vehicle collision | U-Turn | Not a Pedestrian | Overloading | 2 |
| 6 | 18-30 | Male | Junior high school | Employee | 2-5yr | Undivided Two way | Crossing | Unknown | Daylight | Normal | Vehicle with vehicle collision | Moving Backward | Not a Pedestrian | Other | 2 |
| 7 | 18-30 | Male | Junior high school | Employee | 2-5yr | other | Y Shape | Asphalt roads | Daylight | Normal | Vehicle with vehicle collision | U-Turn | Not a Pedestrian | No priority to vehicle | 2 |
| 8 | 18-30 | Male | Junior high school | Employee | Above 10yr | other | Y Shape | Earth roads | Daylight | Normal | Collision with roadside-parked vehicles | Going straight | Crossing from driver's nearside | Changing lane to the right | 2 |
| 9 | 18-30 | Male | Junior high school | Employee | 1-2yr | Undivided Two way | Y Shape | Asphalt roads | Daylight | Normal | Collision with roadside-parked vehicles | U-Turn | Not a Pedestrian | Moving Backward | 1 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12316 entries, 0 to 12315
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Age_band_of_driver       12316 non-null  object
 1   Sex_of_driver            12316 non-null  object
 2   Educational_level        12316 non-null  object
 3   Vehicle_driver_relation  12316 non-null  object
 4   Driving_experience       12316 non-null  object
 5   Lanes_or_Medians         12316 non-null  object
 6   Types_of_Junction        12316 non-null  object
 7   Road_surface_type        12316 non-null  object
 8   Light_conditions         12316 non-null  object
 9   Weather_conditions       12316 non-null  object
 10  Type_of_collision        12316 non-null  object
 11  Vehicle_movement         12316 non-null  object
 12  Pedestrian_movement      12316 non-null  object
 13  Cause_of_accident        12316 non-null  object
 14  Accident_severity        12316 non-null  int64
dtypes: int64(1), object(14)
memory usage: 1.4+ MB
print(f"This dataset contains {df.shape[0]} rows and {df.shape[1]} columns")
This dataset contains 12316 rows and 15 columns
Here are the descriptions of each column in the dataset:
Age_band_of_driver: The age group of the driver involved in the accident (e.g., 18-30, 31-50).
Sex_of_driver: The gender of the driver involved in the accident (e.g., Male, Female).
Educational_level: The highest level of education attained by the driver (e.g., Above high school, Junior high school).
Vehicle_driver_relation: The relationship between the driver and the vehicle (e.g., Employee, Owner).
Driving_experience: The number of years the driver has been driving (e.g., 1-2yr, Above 10yr).
Lanes_or_Medians: The type of road or median where the accident occurred (e.g., Undivided Two way, other).
Types_of_Junction: The type of junction where the accident took place (e.g., No junction, Y Shape).
Road_surface_type: The type of road surface at the accident location (e.g., Asphalt roads, Earth roads).
Light_conditions: The lighting conditions at the time of the accident (e.g., Daylight, Darkness - lights lit).
Weather_conditions: The weather conditions during the accident (e.g., Normal, Rainy).
Type_of_collision: The nature of the collision (e.g., Collision with roadside-parked vehicles, Vehicle with vehicle collision).
Vehicle_movement: The movement of the vehicle(s) involved in the accident (e.g., Going straight, Turning).
Pedestrian_movement: The movement of pedestrians involved in the accident, if any (e.g., Not a Pedestrian).
Cause_of_accident: The primary cause or contributing factor of the accident (e.g., Overtaking, Changing lane).
Accident_severity: The severity of the accident, usually categorized by the level of damage or injury (e.g., Slight, Fatal).
#We'll start off with renaming some of our columns for ease when analyzing
df.rename(columns={'Age_band_of_driver':"age",'Sex_of_driver':"sex",'Educational_level':"education",
'Vehicle_driver_relation':"relation",'Driving_experience':"experience",'Lanes_or_Medians':"L/M",
'Road_surface_type':"surface",'Types_of_Junction':"tjunction",'Light_conditions':"light",
'Weather_conditions':"weather",'Type_of_collision':"tcollision",'Vehicle_movement':"vmovement",
'Pedestrian_movement':"pmovement",'Cause_of_accident':'cause','Accident_severity':'Severity'},inplace = True)
We'll go through the columns one by one to gain a deeper understanding of our dataset.
We'll be starting with our Age column.
#I'll replace each range with a category label to make things easier
#so let's check out our ranges
df['age'].value_counts()
age
18-30       4271
31-50       4087
Over 51     1585
Unknown     1548
Under 18     825
Name: count, dtype: int64
# With that we can now map our data to its categories
df.age = df.age.replace({'18-30' : "Young Adults",
'31-50' : "Older Adults",
'Over 51' : 'Elderly',
"Under 18" : 'Child'})
fig = px.histogram(df, x = 'age', title = 'Age Distribution', color_discrete_sequence = ['blue'])
fig.show()
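One optional refinement, sketched here on a small hypothetical sample rather than the real file: storing the age labels as an ordered categorical keeps them in a sensible order in tables and charts. The `sample` series and the `age_order` list (youngest to oldest, with "Unknown" last) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample using the labels produced by the mapping above
sample = pd.Series(["Older Adults", "Child", "Unknown",
                    "Young Adults", "Elderly", "Young Adults"])

# Assumed ordering: youngest to oldest, with "Unknown" placed last
age_order = ["Child", "Young Adults", "Older Adults", "Elderly", "Unknown"]
ordered_age = pd.Series(pd.Categorical(sample, categories=age_order, ordered=True))

# sort_index() follows the category order rather than alphabetical order
age_counts = ordered_age.value_counts().sort_index()
print(age_counts)
```

The same list could be passed to plotly express through its `category_orders` argument, for example `category_orders={'age': age_order}`, to fix the bar order in the chart.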
df.sex.value_counts()
sex
Male       11437
Female       701
Unknown      178
Name: count, dtype: int64
# The unknowns are few and the column is overwhelmingly male,
# so we fill them with the most frequent category (the mode).
df['sex'] = df['sex'].replace('Unknown', df['sex'].mode()[0])
fig = px.histogram(df, x = 'sex', title = 'Gender Distribution',
color = 'sex',color_discrete_map = {'Male':'blue','Female':'grey'})
fig.show()
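The mode-based fill used for the sex column can be sketched on a toy series. This is a minimal illustration, not the notebook's exact code: the `sex` sample is hypothetical, and taking the mode over known values only is an added precaution so that "Unknown" can never be chosen as the fill value.

```python
import pandas as pd

# Hypothetical sample standing in for the sex column
sex = pd.Series(["Male", "Male", "Female", "Unknown", "Male", "Unknown"])

# Compute the mode over known values only, then use it to fill the unknowns
most_common = sex[sex != "Unknown"].mode()[0]
filled = sex.replace("Unknown", most_common)

print(filled.value_counts())
```

Note that this kind of fill slightly inflates the majority category, which is acceptable only when the unknowns are a small share of the column.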
df.education.unique()
array(['Above high school', 'Junior high school', 'Unknown',
'Elementary school', 'High school', 'Illiterate',
'Writing & reading'], dtype=object)
df.education.value_counts()
education
Junior high school    7619
Elementary school     2163
High school           1110
Unknown                841
Above high school      362
Writing & reading      176
Illiterate              45
Name: count, dtype: int64
Interestingly, most individuals involved in road accidents completed only junior high school, a group that alone accounts for over 60% of our data. The top three groups all hold a high school certificate or less and together make up about 88.4% of the records. This could suggest that drivers with less formal education are more accident-prone, perhaps because of weaker foundational knowledge of roads and driving, but without knowing how education is distributed among all drivers in the city this remains a hypothesis rather than a conclusion.
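The percentages quoted above can be checked with a few lines of arithmetic. The sketch below rebuilds the counts by hand from the `value_counts()` output rather than reading the file, so `education_counts` is a stand-in for `df.education.value_counts()`.

```python
import pandas as pd

# Counts copied from the value_counts() output above
education_counts = pd.Series({
    "Junior high school": 7619,
    "Elementary school": 2163,
    "High school": 1110,
    "Unknown": 841,
    "Above high school": 362,
    "Writing & reading": 176,
    "Illiterate": 45,
})

# Share of each group in percent, and the running total from the largest group down
shares = education_counts / education_counts.sum() * 100
cumulative = shares.cumsum()

print(shares.round(1))
print(cumulative.round(1))
```

The running total after the third row is the combined share of the three largest groups.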
df.relation.value_counts()
relation
Employee    9627
Owner       1973
Unknown      593
Other        123
Name: count, dtype: int64
We've gone through a few of our columns and see that unknowns are unavoidable, so we'll handle them column by column as we go.
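To see how widespread the unknowns are, one option is to measure their share in every column at once. The helper below is a hypothetical sketch (`unknown_share` is not part of the original notebook) and is demonstrated on a toy frame; the comparison is case-insensitive because some columns contain both "Unknown" and "unknown".

```python
import pandas as pd

def unknown_share(frame: pd.DataFrame) -> pd.Series:
    """Percentage of rows per column whose value is 'Unknown' (case-insensitive)."""
    as_text = frame.astype(str).apply(lambda col: col.str.strip().str.lower())
    return (as_text == "unknown").mean().mul(100).sort_values(ascending=False)

# Hypothetical mini-frame; with the real data the call would be unknown_share(df)
toy = pd.DataFrame({
    "education": ["Unknown", "High school", "unknown", "Elementary school"],
    "relation": ["Employee", "Owner", "Unknown", "Employee"],
    "light": ["Daylight", "Daylight", "Daylight", "Daylight"],
})

result = unknown_share(toy)
print(result)
```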
df.experience.value_counts()
experience
5-10yr        3363
2-5yr         2613
Above 10yr    2262
1-2yr         1756
Below 1yr     1342
Unknown        829
No Licence     118
unknown         33
Name: count, dtype: int64
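# Encode the experience bands as ordinal ranks and merge the lowercase 'unknown' into 'Unknown'.
# 'No Licence' (the dataset's spelling) is kept as its own category.
df.experience = df.experience.replace({'5-10yr':4,'2-5yr':3,'Above 10yr':5,'1-2yr':2,
'Below 1yr':1,'unknown':'Unknown'})
fig = px.violin(df, x = 'experience', title = 'Driving Experience Distribution', color_discrete_sequence = ['blue'])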
fig.show()
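A mapping key that matches no label is silently ignored by `replace()`. For example, the data spells the label `No Licence`, so a key written `No License` would match nothing. A small check like the one below can catch this before the mapping is applied. The helper name `unmatched_keys` is hypothetical, and the labels are copied by hand from the `value_counts()` output above.

```python
import pandas as pd

def unmatched_keys(series: pd.Series, mapping: dict) -> list:
    """Return mapping keys that never occur in the series (replace() would skip them)."""
    present = set(series.unique())
    return [key for key in mapping if key not in present]

# Labels copied from the value_counts() output above
experience_labels = pd.Series(["5-10yr", "2-5yr", "Above 10yr", "1-2yr",
                               "Below 1yr", "Unknown", "No Licence", "unknown"])

# Illustrative mapping containing one misspelled key
mapping = {"5-10yr": 4, "2-5yr": 3, "Above 10yr": 5, "1-2yr": 2,
           "Below 1yr": 1, "unknown": "Unknown", "No License": "Unknown"}

missing = unmatched_keys(experience_labels, mapping)
print(missing)  # ['No License']
```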
df['L/M'].value_counts()
L/M
Two-way (divided with broken lines road marking)    4411
Undivided Two way                                   3796
other                                               1660
Double carriageway (median)                         1020
One way                                              845
Unknown                                              442
Two-way (divided with solid lines road marking)      142
Name: count, dtype: int64
A short description of our categories.
df['tjunction'].value_counts()
tjunction
Y Shape        4543
No junction    3837
Crossing       2177
Unknown        1078
Other           445
O Shape         164
T Shape          60
X Shape          12
Name: count, dtype: int64
#Because crossing and X junctions typically have the same definition, we'll add them together
df['tjunction'] = df['tjunction'].replace({'X Shape': 'Crossing'})
For those not familiar with road terminology, here is a short description of the junction types in the dataset:
No Junction: The accident occurred on a road section without any junction or intersection.
Y Shape: A junction where one road splits into two, forming a "Y" shape.
Crossing: An intersection where roads cross each other, typically at right angles (similar to an "X" shape).
O Shape: A circular junction, likely a roundabout, where traffic moves in one direction around a central island.
T Shape: A "T" junction where one road ends at a perpendicular intersection with another road.
Y-shaped junctions are the most frequent junction type in our data. A few characteristics may help explain why:
- Multiple Points of Conflict: A Y-junction has multiple points where vehicles can intersect. Unlike a straight road or a simple T-junction, vehicles at a Y-junction may approach from different angles, creating more opportunities for collisions.
- Visibility Issues: The angles at Y-junctions can sometimes create blind spots or reduce visibility for drivers, making it harder to see oncoming traffic, especially when turning.
- Turning Movements: At a Y-junction, vehicles often need to make sharp turns, either merging into or crossing traffic. These maneuvers increase the risk of accidents, particularly if drivers misjudge the speed or distance of oncoming vehicles.
These are just some of the reasons Y junctions may appear so often in our data.
df['light'].value_counts()
light
Daylight                   8798
Darkness - lights lit      3286
Darkness - no lighting      192
Darkness - lights unlit      40
Name: count, dtype: int64
# Let's check out these values in a bar chart
fig = px.histogram(df, x='light', title='Distribution of Light Conditions')
# Update the bar color and outline
fig.update_traces(marker_color='mediumseagreen',marker_line_color='black',marker_line_width=0.5)
# Update the title font
fig.update_layout(
title_font=dict(size=20, color='black')
)
# Show the figure
fig.show()
I would have hypothesized that less light leads to more accidents, but that ignores the fact that these light categories correspond to different times of day. Traffic peaks during the daytime and tends to thin out as the day goes on, which may explain why the number of accidents falls as the light dims.
df['weather'].value_counts()
weather
Normal               10063
Raining               1331
Other                  296
Unknown                292
Cloudy                 125
Windy                   98
Snow                    61
Raining and Windy       40
Fog or mist             10
Name: count, dtype: int64
#Due to the considerable difference between the values, we'll replace our unknowns with the mode
df['weather'] = df['weather'].replace({'Unknown':'Normal'})
More than 80% of accidents occurred in normal weather, so weather does not appear to be a major factor in most accidents. Rain is the next most common condition, and the gap between it and the remaining conditions suggests that bad weather contributes to accidents, if only slightly.
Another thing to note is the very low counts for certain conditions. This is no surprise: few people want to drive when it's snowing or during a storm that is worse than ordinary rain, and fog or mist is typically associated with very early mornings, when fewer cars are on the road.
df.vmovement.value_counts()
vmovement
Going straight         8158
Moving Backward         985
Other                   937
Reversing               563
Turnover                489
Unknown                 396
Getting off             339
Entering a junction     193
Overtaking               96
Stopping                 61
U-Turn                   50
Waiting to go            39
Parked                   10
Name: count, dtype: int64
# Since reversing consists of the same motions as moving backwards, we've decided to join these two together
df['vmovement'] = df['vmovement'].replace({'Reversing':'Moving Backward'})
# Let's check out these values in a histogram
fig = px.histogram(df,'vmovement',title = 'Movement Distribution',color = 'vmovement',
color_discrete_map = {'Going straight':'blue'},color_discrete_sequence=['gray'])
fig.update_traces(marker_line_color='black',marker_line_width=0.5)
fig.update_layout(
xaxis_title="Vehicle Movement", # Specify x-axis title
yaxis_title="", # Specify y-axis title
showlegend=False, # Hide legend
title_font=dict(size=20, color='black')
)
fig.show()
df.tcollision.value_counts()
tcollision
Vehicle with vehicle collision             8774
Collision with roadside objects            1786
Collision with pedestrians                  896
Rollover                                    397
Collision with animals                      171
Unknown                                     169
Collision with roadside-parked vehicles      54
Fall from vehicles                           34
Other                                        26
With Train                                    9
Name: count, dtype: int64
# Let's check out its distribution
fig = px.histogram(df,y='tcollision',title = 'Type of Collision',
category_orders={'tcollision': df['tcollision'].value_counts().index.tolist()})
fig.update_traces(marker_line_color = 'black')
fig.update_layout(
xaxis_title="",
yaxis_title="Type of Collision",
showlegend=False, # Hide legend
yaxis=dict(showgrid=False), # Remove y-axis grid lines
title_font=dict(size=20, color='black')
)
fig.show()
The vast majority of collisions involve vehicles colliding with other vehicles (8,774 incidents), followed by collisions with roadside objects (1,786) and, unfortunately, pedestrians (896). Collisions with roadside-parked vehicles (54) are far less frequent, reinforcing what we saw in the vehicle movement column, where parked vehicles were involved in the fewest accidents. They are not completely off the hook, though, which is not surprising given how many reckless drivers are on the road.
df.cause.value_counts()
cause
No distancing                           2263
Changing lane to the right              1808
Changing lane to the left               1473
Driving carelessly                      1402
No priority to vehicle                  1207
Moving Backward                         1137
No priority to pedestrian                721
Other                                    456
Overtaking                               430
Driving under the influence of drugs     340
Driving to the left                      284
Getting off the vehicle improperly       197
Driving at high speed                    174
Overturning                              149
Turnover                                  78
Overspeed                                 61
Overloading                               59
Drunk driving                             27
Unknown                                   25
Improper parking                          25
Name: count, dtype: int64
If you're questioning why overspeeding and high speeds aren't the same, it is because:
- Driving at high speed: This generally refers to driving at speeds significantly above the average for the road conditions or traffic flow but may not necessarily exceed the legal speed limit. It indicates excessive speed relative to the norm.
- Overspeeding: This specifically means driving faster than the posted speed limit. It is a more precise term indicating that the speed exceeds legal regulations, regardless of the driving conditions.
At first glance, the leading cause of accidents is drivers not leaving enough distance between vehicles. However, changing lanes to the right and to the left both have significant counts, so taken together lane changes are arguably the leading cause of accidents in Addis Ababa. Changing to the right is the more frequent of the two, which may be because traffic in Ethiopia drives on the right side of the road: drivers typically move right to join slower traffic or when preparing to exit or turn right, as in other countries with right-hand traffic.
Next we'll combine these two categories together
df['cause'] = df['cause'].replace({'Changing lane to the left':'Changing lanes','Changing lane to the right':"Changing lanes"})
df['cause'].value_counts()
cause
Changing lanes                          3281
No distancing                           2263
Driving carelessly                      1402
No priority to vehicle                  1207
Moving Backward                         1137
No priority to pedestrian                721
Other                                    456
Overtaking                               430
Driving under the influence of drugs     340
Driving to the left                      284
Getting off the vehicle improperly       197
Driving at high speed                    174
Overturning                              149
Turnover                                  78
Overspeed                                 61
Overloading                               59
Drunk driving                             27
Unknown                                   25
Improper parking                          25
Name: count, dtype: int64
The major cause of accidents is now `Changing lanes`, followed by no distancing between vehicles, which can lead to vehicle-with-vehicle collisions if the vehicle in front stops suddenly or the one behind doesn't brake in time. Another important cause is careless driving. Although it is a category of its own, giving no priority to vehicles or pedestrians also reflects a lack of care, which tells us that the absence of caution is a huge factor in accidents. The last notable cause involves vehicle movement, i.e. `Moving Backward`, and finally we see another instance of parking being among the least common categories in the column.
First, we'll convert the severity column from numeric codes to strings. We've gone over the metadata for our dataset, so we know the correct label for each code.
df['Severity'] = df['Severity'].replace({2:'Slight',1:'Serious',0:"Fatal"})
df.Severity.value_counts()
Severity
Slight     10415
Serious     1743
Fatal        158
Name: count, dtype: int64
# Let's check out these values in a donut chart
fig = px.pie(df,names='Severity',title = 'Accident Severity Distribution',color_discrete_sequence=['mediumpurple'],hole=0.5)
fig.update_traces(marker_line_color='black',marker_line_width=0.5)
fig.show()
For this next part of our project, we're going to focus on answering the following couple of questions:
def crossdf(col):
    """
    Return a crosstab of Severity against the given column,
    normalized so that each severity row sums to 1.
    """
    table = pd.crosstab(df['Severity'], df[col], normalize='index')
    return table.reset_index()
def melt(data,col):
    """
    Drop the 'Unknown' column if present and reshape the crosstab
    to long format with one row per (Severity, category) pair.
    """
    if 'Unknown' in data.columns:
        data = data.drop(columns='Unknown')
    return data.melt(id_vars='Severity', var_name=col, value_name='Proportion')
def viz(data,col):
    """Plot grouped bars of the proportions for each severity level."""
    fig = px.histogram(data, x='Severity', y='Proportion', color=col,
                       title='Proportion of Accidents by Severity and '+col,barmode = 'group')
    fig.update_traces(marker_line_color='black',marker_line_width=0.5)
    # Fix the y-axis to the 0-1 range so charts are comparable across columns
    fig.update_layout(
        title_font=dict(size=20, color='black'),
        yaxis = dict(range = [0,1]),
        yaxis_title = 'Proportion'
    )
    fig.show()
'''We are going to create a cross tab for our different categories for each column,
to make use of their proportion so we can examine our dataset a little more statistically.
'''
df1 = crossdf('sex')
df1
| sex | Severity | Female | Male |
|---|---|---|---|
| 0 | Fatal | 0.031646 | 0.968354 |
| 1 | Serious | 0.059667 | 0.940333 |
| 2 | Slight | 0.056841 | 0.943159 |
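Each row of the table above sums to 1 because the crosstab is normalized by row, so every value reads as "the share of accidents at this severity level that involved this category". A minimal sketch on hypothetical records (the `toy` frame is made up for illustration):

```python
import pandas as pd

# Hypothetical records; column names follow the renamed columns used above
toy = pd.DataFrame({
    "Severity": ["Fatal", "Fatal", "Slight", "Slight", "Slight", "Slight"],
    "sex": ["Male", "Male", "Male", "Male", "Male", "Female"],
})

# normalize='index' divides each row by its own total: P(sex | severity)
by_severity = pd.crosstab(toy["Severity"], toy["sex"], normalize="index")
print(by_severity)
```

Reading the table the other way round (the share of each category that ends in a given severity) would need `normalize='columns'` instead.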
data = melt(df1,'Sex')
data
| | Severity | Sex | Proportion |
|---|---|---|---|
| 0 | Fatal | Female | 0.031646 |
| 1 | Serious | Female | 0.059667 |
| 2 | Slight | Female | 0.056841 |
| 3 | Fatal | Male | 0.968354 |
| 4 | Serious | Male | 0.940333 |
| 5 | Slight | Male | 0.943159 |
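The reshaping step turns one column per category into one row per (severity, category) pair, which is the long layout plotly express works with when a column is passed to `color`. A small sketch with values rounded from the table above (the `wide` frame is typed in by hand, not computed):

```python
import pandas as pd

# Wide table with the same layout as the crosstab output (values rounded)
wide = pd.DataFrame({
    "Severity": ["Fatal", "Serious", "Slight"],
    "Female": [0.03, 0.06, 0.06],
    "Male": [0.97, 0.94, 0.94],
})

# One row per (Severity, Sex) pair; the former column names land in "Sex"
long = wide.melt(id_vars="Severity", var_name="Sex", value_name="Proportion")
print(long)
```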
viz(data,'Sex')
df.relation = df.relation.replace({'Other':'Unknown'})
df1 = crossdf('relation')
df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Severity  3 non-null      object
 1   Employee  3 non-null      float64
 2   Owner     3 non-null      float64
 3   Unknown   3 non-null      float64
dtypes: float64(3), object(1)
memory usage: 228.0+ bytes
data = melt(df1,'Relation')
data
| | Severity | Relation | Proportion |
|---|---|---|---|
| 0 | Fatal | Employee | 0.721519 |
| 1 | Serious | Employee | 0.778543 |
| 2 | Slight | Employee | 0.783101 |
| 3 | Fatal | Owner | 0.215190 |
| 4 | Serious | Owner | 0.164659 |
| 5 | Slight | Owner | 0.158617 |
viz(data,'Relation')
df.experience = df.experience.replace({4:'5-10yr',3:'2-5yr',5:'Above 10yr',2:'1-2yr',
1:'Below 1yr'})
df1 = crossdf('experience')
data = melt(df1,'Experience')
data
| | Severity | Experience | Proportion |
|---|---|---|---|
| 0 | Fatal | 1-2yr | 0.132911 |
| 1 | Serious | 1-2yr | 0.130809 |
| 2 | Slight | 1-2yr | 0.144695 |
| 3 | Fatal | 2-5yr | 0.291139 |
| 4 | Serious | 2-5yr | 0.218589 |
| 5 | Slight | 2-5yr | 0.209890 |
| 6 | Fatal | 5-10yr | 0.259494 |
| 7 | Serious | 5-10yr | 0.265060 |
| 8 | Slight | 5-10yr | 0.274604 |
| 9 | Fatal | Above 10yr | 0.183544 |
| 10 | Serious | Above 10yr | 0.185313 |
| 11 | Slight | Above 10yr | 0.183389 |
| 12 | Fatal | Below 1yr | 0.044304 |
| 13 | Serious | Below 1yr | 0.118761 |
| 14 | Slight | Below 1yr | 0.108305 |
| 15 | Fatal | No Licence | 0.000000 |
| 16 | Serious | No Licence | 0.007458 |
| 17 | Slight | No Licence | 0.010082 |
viz(data,'Experience')
df.light = df.light.replace({'Darkness - lights unlit':'Darkness','Darkness - no lighting':'Darkness',
'Darkness - lights lit':'Darkness'})
df1 = crossdf('light')
data = melt(df1,'Light Condition')
data
| | Severity | Light Condition | Proportion |
|---|---|---|---|
| 0 | Fatal | Darkness | 0.449367 |
| 1 | Serious | Darkness | 0.298910 |
| 2 | Slight | Darkness | 0.280941 |
| 3 | Fatal | Daylight | 0.550633 |
| 4 | Serious | Daylight | 0.701090 |
| 5 | Slight | Daylight | 0.719059 |
viz(data,'Light Condition')
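The proportions above answer "of all fatal accidents, what share happened in darkness?". The complementary question, "of all accidents in darkness, what share was fatal?", needs the table normalized by column. The sketch below uses counts reconstructed by multiplying each proportion by the severity totals reported earlier (158, 1,743 and 10,415) and rounding, so treat them as an approximation rather than a fresh read of the data.

```python
import pandas as pd

# Counts reconstructed from the proportions and severity totals above
counts = pd.DataFrame(
    {"Darkness": [71, 521, 2926], "Daylight": [87, 1222, 7489]},
    index=["Fatal", "Serious", "Slight"],
)

# Divide each column by its own total: P(severity | light condition)
severity_by_light = counts / counts.sum(axis=0)
print(severity_by_light.round(4))
```

On these reconstructed counts, roughly 2.0% of accidents in darkness are fatal versus about 1.0% in daylight.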
df1 = crossdf('tcollision')
df1
| tcollision | Severity | Collision with animals | Collision with pedestrians | Collision with roadside objects | Collision with roadside-parked vehicles | Fall from vehicles | Other | Rollover | Unknown | Vehicle with vehicle collision | With Train |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Fatal | 0.012658 | 0.139241 | 0.151899 | 0.000000 | 0.000000 | 0.000000 | 0.025316 | 0.012658 | 0.658228 | 0.000000 |
| 1 | Serious | 0.015491 | 0.080895 | 0.156053 | 0.002869 | 0.002295 | 0.001721 | 0.030981 | 0.018359 | 0.690189 | 0.001147 |
| 2 | Slight | 0.013634 | 0.070379 | 0.143063 | 0.004705 | 0.002880 | 0.002208 | 0.032549 | 0.012962 | 0.716947 | 0.000672 |
df1.drop(columns=['Other'],inplace=True)
data = melt(df1,'Type of Collision')
data
| | Severity | Type of Collision | Proportion |
|---|---|---|---|
| 0 | Fatal | Collision with animals | 0.012658 |
| 1 | Serious | Collision with animals | 0.015491 |
| 2 | Slight | Collision with animals | 0.013634 |
| 3 | Fatal | Collision with pedestrians | 0.139241 |
| 4 | Serious | Collision with pedestrians | 0.080895 |
| 5 | Slight | Collision with pedestrians | 0.070379 |
| 6 | Fatal | Collision with roadside objects | 0.151899 |
| 7 | Serious | Collision with roadside objects | 0.156053 |
| 8 | Slight | Collision with roadside objects | 0.143063 |
| 9 | Fatal | Collision with roadside-parked vehicles | 0.000000 |
| 10 | Serious | Collision with roadside-parked vehicles | 0.002869 |
| 11 | Slight | Collision with roadside-parked vehicles | 0.004705 |
| 12 | Fatal | Fall from vehicles | 0.000000 |
| 13 | Serious | Fall from vehicles | 0.002295 |
| 14 | Slight | Fall from vehicles | 0.002880 |
| 15 | Fatal | Rollover | 0.025316 |
| 16 | Serious | Rollover | 0.030981 |
| 17 | Slight | Rollover | 0.032549 |
| 18 | Fatal | Vehicle with vehicle collision | 0.658228 |
| 19 | Serious | Vehicle with vehicle collision | 0.690189 |
| 20 | Slight | Vehicle with vehicle collision | 0.716947 |
| 21 | Fatal | With Train | 0.000000 |
| 22 | Serious | With Train | 0.001147 |
| 23 | Slight | With Train | 0.000672 |
viz(data,'Type of Collision')
Majority of Accidents by Road Type: Most accidents occur on two-way roads divided by broken line markings and on undivided two-way roads, which together account for about 66% of the data.
Vehicle Relation and Accident Severity: Employees account for the largest share of accidents at every severity level, but their share falls from 78% of slight accidents to 72% of fatal ones. In contrast, the share of vehicle owners rises with severity, from about 16% to about 22%.
Experience and Accident Severity: The distribution of accident severity generally mirrors experience levels, with 5-10 years of experience being the most common band overall. Notably, drivers with 2-5 years of experience make up the largest share of fatal accidents (about 29%). This hints that less experience may be linked to more severe accidents, although drivers with under 1 year of experience account for only about 4% of fatal accidents, so the pattern is not consistent.
Time of Day and Severity: The share of accidents occurring in daylight falls as severity increases (from 72% of slight to 55% of fatal accidents), while the share occurring in darkness rises from 28% to 45%.
Collision Types and Severity: Collisions with roadside objects keep a roughly constant share across severity levels (14-16%). The share of vehicle-with-vehicle collisions falls slightly as severity rises (72% to 66%), while the share involving pedestrians roughly doubles (7% to 14%). Collisions with roadside-parked vehicles, trains, and falls from vehicles do not appear among fatal accidents, but these types are rare overall and there are only 158 fatal records, so this is weak evidence that they are less deadly.
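One caveat on the collision types with no fatal records is worth checking with arithmetic: if each of them simply followed the overall fatal rate, how many fatal cases would we expect to see? The sketch below uses only totals copied from the outputs above and assumes, purely for illustration, that every type shares the same fatal rate.

```python
# Totals copied from the outputs above
total_accidents = 12316
fatal_accidents = 158
overall_fatal_rate = fatal_accidents / total_accidents

# Overall counts for the three collision types with no recorded fatalities
rare_types = {
    "Collision with roadside-parked vehicles": 54,
    "Fall from vehicles": 34,
    "With Train": 9,
}

# Fatal cases expected if each type simply matched the overall fatal rate
expected_fatal = {name: count * overall_fatal_rate
                  for name, count in rare_types.items()}

for name, value in expected_fatal.items():
    print(f"{name}: {value:.2f}")
```

All three expected values are below one, so observing zero fatal cases for these types is consistent with the overall rate and does not by itself show that they are safer.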